AITopics | learning rate schedule

9d4c03631b8b0c85ae08bf05eda37d0f-Supplemental.pdf

Neural Information Processing Systems, Feb-9-2026, 13:15:59 GMT

architecture, duplicate, rate schedule, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Appendices for: Gradient-based Hyperparameter Optimization Over Long Horizons. Paul Micaelli, University of Edinburgh, paul.micaelli@ed.ac.uk; Amos Storkey, University of Edinburgh, a.storkey@ed.ac.uk

Neural Information Processing Systems, Feb-8-2026, 19:46:18 GMT

Now we return to the second part of (9). This illustrates how tight the upper bound is. We use a GeForce RTX 2080 Ti GPU for all experiments. Instead, we always carve out a validation set from our training set. Figure 1 The batch size is set to 128, and 1000 fixed images are used for the validation data. Here we provide the raw hypergradients corresponding to the outer optimization shown in Appendices: Figure 1.
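The excerpt above mentions carving a fixed validation set of 1000 images out of the training set and using a batch size of 128. A minimal sketch of that kind of split follows, using only index bookkeeping. The function names, the seed, and the training-set size of 50000 are assumptions made for illustration and are not taken from the paper.

```python
import random


def carve_validation_split(num_train, num_val=1000, seed=0):
    """Split training indices into (train, validation) with a fixed seed.

    The same seed always yields the same validation indices, so the
    validation images stay fixed across runs.
    """
    rng = random.Random(seed)
    indices = list(range(num_train))
    rng.shuffle(indices)
    return indices[num_val:], indices[:num_val]


def batches(indices, batch_size=128):
    """Yield consecutive mini-batches of indices; the last one may be short."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


# Hypothetical training-set size, chosen only for the example.
train_idx, val_idx = carve_validation_split(50000)
first_batch = next(batches(train_idx))
```

Keeping the validation indices fixed means every outer-optimization step is scored on the same held-out images, so changes in validation loss are not confounded by resampling.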

artificial intelligence, hypergradient, machine learning, (11 more...)

Country: Europe > Sweden > Stockholm > Stockholm (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

We appreciate the reviewers' time and suggestions! We address them all and report new experimental results below. Although DIH can be helpful to identify noisy data in the noisy-label setting (ref. middle plot in Figure 1), DIHCL still achieves 90.34% test-set accuracy under 40% symmetric label noise on CIFAR10 (ref. top plot in Figure 1). The statement may be revised that "updating in-... Is the method specific to cyclic learning rate... DIHCL is applicable to other learning rate schedules. We report the result of DIHCL with a piecewise exponential decay learning rate in Figure 1.
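The excerpt above contrasts a cyclic learning rate with a piecewise exponential decay schedule. The two shapes can be sketched as plain functions of the epoch. This is an illustrative sketch only: the cycle length, boundaries, decay factors, and learning-rate values are assumptions, and the exact schedules used by the authors are not given in the excerpt.

```python
import math


def cyclic_cosine_lr(epoch, cycle_len=10, lr_max=0.1, lr_min=0.001):
    """Cosine decay from lr_max towards lr_min, restarted every cycle_len epochs."""
    t = (epoch % cycle_len) / cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))


def piecewise_exponential_lr(epoch, lr_start=0.1,
                             boundaries=(0, 30, 60),
                             gammas=(0.99, 0.95, 0.90)):
    """Exponential decay whose per-epoch factor changes at each boundary.

    Piece i covers epochs [boundaries[i], boundaries[i + 1]) and multiplies
    the learning rate by gammas[i] once per epoch spent in that piece.
    """
    lr = lr_start
    for i, (start, gamma) in enumerate(zip(boundaries, gammas)):
        end = boundaries[i + 1] if i + 1 < len(boundaries) else float("inf")
        epochs_in_piece = max(0, min(epoch, end) - start)
        lr *= gamma ** epochs_in_piece
    return lr
```

The cyclic schedule returns to `lr_max` at the start of every cycle, while the piecewise schedule never increases; that difference is what makes "is the method specific to cyclic learning rates" a meaningful question.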

accuracy, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.79)

1ef91c212e30e14bf125e9374262401f-Supplemental.pdf

Neural Information Processing Systems, Oct-2-2025, 09:51:48 GMT

artificial intelligence, machine learning, weight loss landscape, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

9d4c03631b8b0c85ae08bf05eda37d0f-Supplemental.pdf

Neural Information Processing Systems, Aug-15-2025, 09:49:05 GMT

architecture, artificial intelligence, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Appendices for: Gradient-based Hyperparameter Optimization Over Long Horizons. Paul Micaelli, University of Edinburgh, paul.micaelli@ed.ac.uk; Amos Storkey, University of Edinburgh, a.storkey@ed.ac.uk

Neural Information Processing Systems, Aug-14-2025, 16:12:26 GMT

hypergradient, hyperparameter, university, (9 more...)

Country: Europe > Sweden > Stockholm > Stockholm (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

A Proof of Lemma 4.2. Lemma A.1 (Restatement of Lemma 4.2)

Neural Information Processing Systems, Aug-14-2025, 04:21:29 GMT

Lemma A.5 of [19] we have By substituting (A.5) into (A.1) we have, All experiments are conducted on a single NVIDIA V100. It runs on the GNU Linux Debian 4.9 operating system. The experiment is implemented via PyTorch 1.6.0. This makes the learning problem of CIFAR100 much harder. To demonstrate the fact that the over-fitting problem all comes from perturbation stability in Section 3.2(3), we We found this schedule is the most effective one when only training on the original CIFAR10. In this part, we provide a complete visualization for the two parts in Eqn. We test WideResNet-34 on CIFAR10 and CIFAR10.

artificial intelligence, machine learning, robust regularization, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

Decoupled Relative Learning Rate Schedules

Ludziejewski, Jan, Małaśnicki, Jan, Pióro, Maciej, Krutul, Michał, Ciebiera, Kamil, Stefaniak, Maciej, Krajewski, Jakub, Sankowski, Piotr, Cygan, Marek, Adamczewski, Kamil, Jaszczur, Sebastian

arXiv.org Artificial Intelligence, Jul-8-2025

In this work, we introduce a novel approach for optimizing LLM training by adjusting learning rates across the weights of different components in Transformer models. Traditional methods often apply a uniform learning rate across all network layers, potentially overlooking the unique dynamics of each part. Remarkably, our method, Relative Learning Rate Schedules (RLRS), accelerates the training process by up to $23\%$, particularly in complex models such as Mixture of Experts (MoE). Hyperparameters of RLRS can be efficiently tuned on smaller models and then effectively reused on models up to $27\times$ larger. This simple and effective method results in a substantial reduction in training time and computational resources, offering a practical and scalable solution for optimizing large-scale neural networks.
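The core idea in the abstract, giving each Transformer component its own learning rate relative to a shared schedule, can be sketched as follows. This is a simplified illustration and not the paper's parameterization: the component names, the multiplier values, the use of one constant multiplier per component, and the warmup-plus-cosine base schedule are all assumptions.

```python
import math


def base_lr(step, total_steps, peak_lr=1e-3, warmup_steps=100, final_fraction=0.1):
    """Shared schedule: linear warmup, then cosine decay to final_fraction * peak_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    return peak_lr * (final_fraction + (1.0 - final_fraction) * cosine)


# Hypothetical per-component multipliers, chosen only for illustration.
RELATIVE_LR = {"embedding": 2.0, "attention": 1.0, "expert": 0.5, "router": 0.25}


def component_lrs(step, total_steps, relative=RELATIVE_LR):
    """Scale the shared learning rate by each component's relative multiplier."""
    shared = base_lr(step, total_steps)
    return {name: shared * scale for name, scale in relative.items()}
```

In a training loop, each component's parameters would sit in their own optimizer parameter group, and the group's learning rate would be overwritten with the matching entry of `component_lrs(step, total_steps)` before each update. Because the multipliers are relative to the base rate rather than absolute, they can in principle be tuned on a small model and carried over when the base rate changes.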

large language model, machine learning, natural language, (18 more...)

2507.03526

Country: